ABSTRACT

There has been renewed interest in the analysis of hit/miss, or Bernoulli, data in the context of Nondestructive Evaluation (NDE). Many contributions have focused on estimating confidence bounds for this type of data, and concerns have been raised about the proper calculation and use of a90 and a90/95 estimates. In particular, the behavior of a Probability of Detection (POD) curve for flaw sizes larger than the stated a90 estimate has been of concern. This paper gives a brief historical overview of the analysis of hit/miss data and proposes a remedy for these recent concerns. The result is a procedure that helps avoid pitfalls in the analysis of hit/miss data.


KEYWORDS

Probability of Detection, hit/miss, Bernoulli data, Bayesian methods

INTRODUCTION

Studies of the reliability of Nondestructive Inspection (NDI) began with the United States Air Force (USAF) in the 1960s (Packman et al., 1968) and continued during the 1970s with major efforts conducted by the National Aeronautics and Space Administration (NASA) (Rummel et al., 1974) and the USAF (Lewis et al., 1978). Initial efforts to explain the data from these inspection studies used binomial statistics (Rummel, 1982). In the 1980s, Berens and Hovey developed parametric methods to analyze both hit/miss and signal response data (Berens et al., 1981; Berens, 1989). This work provided the foundation for the POD study guidance adopted by the Department of Defense (DoD), which has been published as the second edition of a handbook (MIL-HDBK-1823A, 2009). Other agencies and industries have also made use of this guidance (Gandossi et al., 2010; Drury et al., 2006). Efforts in medical statistics are also quite similar to the methods currently used for NDE (Collett, 2002). In medical statistics, the term effective dose is equivalent to a50, and lethal dose is equivalent to a90.

CURRENT METHODS OF ANALYSIS FOR HIT/MISS DATA

The currently accepted methods for analyzing hit/miss data are similar to those published in an overview paper on the topic (Berens, 1989). In that work, the confidence bounds were global; that is, they were calculated to apply to the entire POD curve. It was later decided that this approach was overly conservative and that it was sufficient to apply confidence bounds locally at each flaw size (Berens, 2000). The likelihood ratio method, which provides more accurate confidence bounds for hit/miss data, was also introduced (Harding et al., 2003; Annis et al., 2007). These two developments were incorporated in the second edition of MIL-HDBK-1823. Recently, concerns have been raised about the current approach, and binomial methods have again been proposed to address the behavior of POD for flaw sizes greater than the a90 estimates produced by parametric approaches (Generazio, 2011). Further work on nonparametric methods has been done by Spencer under the assumption that POD is a continuous, non-decreasing function of flaw size (Spencer, 2011).

In the authors’ opinion, (Generazio, 2011) raises a valid concern: one could analyze hit/miss data that do not meet the requirements of MIL-HDBK-1823A and still generate an a90/95 estimate that could be misused. Another concern in (Generazio, 2011) is the assumption of a monotonically increasing POD function. Proper 3-point calibration (Rummel, 2005) should mitigate, at least in part, concerns about monotonicity.

Approved for public release; distribution unlimited.

A REMEDY FOR CONCERNS ABOUT HIT/MISS ANALYSIS

It is indeed possible to generate misleading a90/95 estimates from hit/miss data. In this paper, we propose a Bayesian approach using 3- and 4-parameter models as a remedy for this problem. In general, statistical expertise should be consulted to avoid errors in POD studies.

First, the idea of using more than 2 parameters in the POD model has been proposed in the past (Moore et al., 2001). Additional parameters are added for two reasons: 1) the data do not always support the POD curve converging to one for large flaw sizes, and 2) accounting for false calls means the POD curve need not converge to zero for small flaw sizes. The difficulty with conventional methods lies in estimating these parameters. The Bayesian methods proposed in this work provide straightforward estimation of the model parameters, a simple way to compute confidence bounds, and a systematic method for model selection based on the Bayes factor.
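For reference, the Bayes factor has a standard definition. The sketch below writes it for two competing POD models M_1 and M_2 and the vector of hit/miss outcomes y; this notation is assumed here rather than taken from the original.

```latex
K_{12} = \frac{p(\mathbf{y} \mid M_1)}{p(\mathbf{y} \mid M_2)},
\qquad
\frac{P(M_1 \mid \mathbf{y})}{P(M_2 \mid \mathbf{y})} = K_{12} \cdot \frac{P(M_1)}{P(M_2)}
```

Values of K_12 greater than 1 favor M_1. With equal prior probabilities for the two models, the Bayes factor equals the posterior odds.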

A detailed description of Bayesian methods is beyond the scope of this paper, but an overview of how they can be applied in an NDE context is given in (Thompson, 2010), and the mathematical details can be found in (Christensen, 2010). A very simple statement of a problem posed in Bayesian terms is shown in Figure 1.

The posterior is a probability density function that reads literally as the probability of the model parameters θ given the data. The likelihood is also a probability density function, read as the probability of the data given the model parameters. The prior can be noninformative, as it is in this work, or it can represent expert opinion or information from previous experiments. The normalizing constant, or marginal likelihood, in the denominator is the integral of the numerator over all values of the parameters. Most people are familiar with Bayes’ rule, but calculating the normalizing constant, commonly called the evidence in the literature, is computationally challenging. It requires a high-dimensional integration that is typically performed via sampling methods such as Markov Chain Monte Carlo (MCMC).
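The relationship described above (posterior equals likelihood times prior, divided by the evidence) can be written in standard notation. Here θ is the parameter vector and y the hit/miss outcomes; the symbols are assumed for this sketch.

```latex
p(\boldsymbol{\theta} \mid \mathbf{y}) =
\frac{p(\mathbf{y} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\mathbf{y})},
\qquad
p(\mathbf{y}) = \int p(\mathbf{y} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta}
```

The integral defining the evidence p(y) has the dimension of θ (2, 3, or 4 for the models considered in this paper), which is why it is approximated numerically rather than evaluated in closed form.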

COMPARISON OF BAYESIAN APPROACH WITH STANDARD METHODS

Before applying 3- and 4-parameter models to the difficult data sets in (Generazio, 2011), the Bayesian methods are illustrated on data from Berens’ seminal paper on the topic (Berens, 1989). For hit/miss analysis, each outcome y_i (1 for a hit, 0 for a miss) on a flaw of size a_i is assumed to follow a Bernoulli distribution with probability p_i, and either a logit or a probit model can be used for p_i.


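$$ y_i \sim \mathrm{Bernoulli}(p_i) \tag{1} $$

$$ p_i = \frac{\exp(b_0 + b_1 \ln a_i)}{1 + \exp(b_0 + b_1 \ln a_i)} \qquad \text{(logit)} \tag{2} $$

$$ p_i = \Phi(b_0 + b_1 \ln a_i) \qquad \text{(probit)} \tag{3} $$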
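A minimal Python sketch of equations 1 to 3, assuming a_i is a positive flaw size and y_i is coded 1 for a hit and 0 for a miss. The function names are illustrative only, and the probability is clamped slightly away from 0 and 1 so that the logarithm stays finite.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pod_logit(a, b0, b1):
    """Equation 2: logit POD model for flaw size a > 0."""
    return _sigmoid(b0 + b1 * math.log(a))


def pod_probit(a, b0, b1):
    """Equation 3: probit POD model; Phi is the standard normal CDF."""
    z = b0 + b1 * math.log(a)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bernoulli_log_likelihood(sizes, hits, pod, b0, b1, eps=1e-12):
    """Equation 1: log-likelihood of hit (1) / miss (0) outcomes, y_i ~ Bernoulli(p_i)."""
    total = 0.0
    for a, y in zip(sizes, hits):
        p = min(max(pod(a, b0, b1), eps), 1.0 - eps)  # keep log() finite
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total
```

Passing `pod_logit` or `pod_probit` as the `pod` argument switches the link function without changing the likelihood code.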

Here a_i is the size of the i-th flaw and Φ is the standard normal cumulative distribution function. The traditional methods for estimating the parameters b0 and b1 are well established, and the Bayesian methods are compared with these standard cases. Again, noninformative priors are used, and MCMC sampling is used to extract information about the parameters. The data provided in (Berens, 1989) and displayed in Figure 7 of that work serve as a benchmark. Figure 2 displays the resulting POD curves, which appear identical to those obtained with the traditional analysis methods. An alternative form of the logit model is given in equation 4; its parameters μ and σ are related to the parameters in equation 2 by equations 5 and 6. A summary comparison of the results is shown in Table 1.


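$$ p_i = \frac{1}{1 + \exp\left[-\dfrac{\pi}{\sqrt{3}}\left(\dfrac{\ln a_i - \mu}{\sigma}\right)\right]} \tag{4} $$

$$ \mu = -\frac{b_0}{b_1} \tag{5} $$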


$$ \sigma = \frac{\pi}{\sqrt{3}\, b_1} \tag{6} $$


Inspector 

Parameter 

A 

B 

C 

Composite 

0.96/0.96 

1.11/1.11 

0.82/0.82 

0.96/0.96 

o 

0.59/0.59 

1.04/1.05 

0.87/0.89 

0.88/0.85 

Table  1.  Original  analysis/Bayesian  methods 
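The mapping between the (b0, b1) and (μ, σ) forms of the logit model can be sketched as below. It assumes the usual logistic parameterization in which σ is the standard deviation of the underlying logistic distribution of ln(a), so μ is the ln(a) value at which POD = 0.5 (that is, a50 = exp(μ)). The function names are hypothetical.

```python
import math

# Factor linking the logistic slope to a standard deviation on the ln(a) scale.
_K = math.pi / math.sqrt(3.0)


def mu_sigma_from_b(b0, b1):
    """Equations 5 and 6: mu = -b0 / b1, sigma = pi / (sqrt(3) * b1)."""
    return -b0 / b1, _K / b1


def b_from_mu_sigma(mu, sigma):
    """Inverse mapping: b1 = pi / (sqrt(3) * sigma), b0 = -mu * b1."""
    b1 = _K / sigma
    return -mu * b1, b1


def pod_mu_sigma(a, mu, sigma):
    """Equation 4: logit POD written in terms of mu and sigma."""
    return 1.0 / (1.0 + math.exp(-_K * (math.log(a) - mu) / sigma))
```

Applying `mu_sigma_from_b` to fitted (b0, b1) values, or to each MCMC sample, gives μ and σ summaries of the kind compared in Table 1.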

There are no substantial differences between the results for the 2-parameter models. The Bayesian approach is introduced here because inference for 3- and 4-parameter models is quite difficult with traditional methods, whereas the sampling methods of the Bayesian approach are just as easy to implement for the 3- and 4-parameter models as for the 2-parameter model.
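To illustrate why sampling makes the inference straightforward, here is a minimal random-walk Metropolis sketch for the 2-parameter logit model (equations 1 and 2). It is not the authors’ implementation. It assumes a flat prior restricted to a box (a stand-in for a noninformative prior), a Gaussian proposal with a hand-picked step size, and a fixed burn-in. A one-sided lower bound on POD at a given flaw size is taken as an empirical quantile of POD over the retained samples.

```python
import math
import random


def softplus(x):
    """log(1 + exp(x)), computed without overflow."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def log_lik_logit(b0, b1, sizes, hits):
    """Bernoulli log-likelihood of the 2-parameter logit model."""
    total = 0.0
    for a, y in zip(sizes, hits):
        z = b0 + b1 * math.log(a)
        # log p = -softplus(-z) and log(1 - p) = -softplus(z)
        total += -softplus(-z) if y == 1 else -softplus(z)
    return total


def metropolis(sizes, hits, n_samples=5000, step=0.3, seed=0, box=(-50.0, 50.0)):
    """Random-walk Metropolis with a flat prior on a bounded box (assumed prior)."""
    rng = random.Random(seed)
    lo, hi = box
    theta = (0.0, 1.0)  # arbitrary starting point (b0, b1)
    current = log_lik_logit(*theta, sizes, hits)
    samples = []
    for _ in range(n_samples):
        proposal = (theta[0] + rng.gauss(0.0, step), theta[1] + rng.gauss(0.0, step))
        if lo <= proposal[0] <= hi and lo <= proposal[1] <= hi:
            candidate = log_lik_logit(*proposal, sizes, hits)
            # The flat prior cancels, so acceptance depends only on the likelihood ratio.
            if rng.random() < math.exp(min(0.0, candidate - current)):
                theta, current = proposal, candidate
        samples.append(theta)
    return samples


def pod_lower_bound(samples, a, q=0.05, burn_in=1000):
    """Empirical q-quantile of POD(a) over post-burn-in samples (one-sided credible bound)."""
    pods = sorted(math.exp(-softplus(-(b0 + b1 * math.log(a)))) for b0, b1 in samples[burn_in:])
    return pods[int(q * (len(pods) - 1))]
```

Extending this to 3 or 4 parameters only requires adding α and β to the state vector and the likelihood, which is the point made in the paragraph above. In practice, convergence diagnostics and tuning of the step size would be needed.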

ADDITIONAL PARAMETERS IN HIT/MISS MODELS

The motivation for adding parameters to the model is to better represent false call rates for small flaw sizes and the tail behavior of the POD curve for large flaw sizes. Equations 7 and 8 show the form of a 3-parameter model with a lower asymptote for the logit and probit models, respectively. Figure 3a depicts this model; the value of α can be thought of as a measure of the false call rate. Equations 9 and 10 show the form of a 3-parameter model with an upper asymptote, depicted in Figure 3b. The β term is the limiting POD as the flaw size goes to infinity, so 1 - β measures the probability of missing arbitrarily large flaws. The 4-parameter model contains both terms; it is described in equations 11 and 12 and depicted in Figure 3c. Proper estimation of the upper asymptote is very important for addressing pitfalls in POD analysis.


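$$ p_i = \alpha + (1 - \alpha)\,\frac{\exp(b_0 + b_1 \ln a_i)}{1 + \exp(b_0 + b_1 \ln a_i)} \qquad \text{(logit)} \tag{7} $$

$$ p_i = \alpha + (1 - \alpha)\,\Phi(b_0 + b_1 \ln a_i) \qquad \text{(probit)} \tag{8} $$

$$ p_i = \beta\,\frac{\exp(b_0 + b_1 \ln a_i)}{1 + \exp(b_0 + b_1 \ln a_i)} \qquad \text{(logit)} \tag{9} $$

$$ p_i = \beta\,\Phi(b_0 + b_1 \ln a_i) \qquad \text{(probit)} \tag{10} $$

$$ p_i = \alpha + (\beta - \alpha)\,\frac{\exp(b_0 + b_1 \ln a_i)}{1 + \exp(b_0 + b_1 \ln a_i)} \qquad \text{(logit)} \tag{11} $$

$$ p_i = \alpha + (\beta - \alpha)\,\Phi(b_0 + b_1 \ln a_i) \qquad \text{(probit)} \tag{12} $$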
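All of the models in equations 2, 3, and 7 to 12 share the form α + (β - α)·F(b0 + b1 ln a), where F is the logistic or standard normal CDF. A minimal sketch of that common form follows; the function name and keyword defaults are illustrative, and it relies on Python’s `statistics.NormalDist` for Φ.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, used for Phi in probit models


def pod_general(a, b0, b1, alpha=0.0, beta=1.0, link="logit"):
    """POD = alpha + (beta - alpha) * F(b0 + b1 ln a).

    alpha = 0, beta = 1 gives the 2-parameter model (equations 2-3);
    alpha > 0 alone gives equations 7-8; beta < 1 alone gives equations 9-10;
    both together give equations 11-12.
    """
    z = b0 + b1 * math.log(a)
    if link == "logit":
        f = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
    elif link == "probit":
        f = _STD_NORMAL.cdf(z)
    else:
        raise ValueError(f"unknown link: {link!r}")
    return alpha + (beta - alpha) * f
```

As the flaw size a grows, F tends to 1 and POD tends to β; as a shrinks toward zero, POD tends to α. This is exactly the asymptotic behavior the extra parameters are meant to capture.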

If β is smaller than 0.9, then a90 and a90/95 simply do not exist, because the POD never reaches 0.9 at any flaw size, even if the 2-parameter model provides estimates. There is therefore a choice among four models: 1) the 2-parameter model, 2) the 3-parameter model with lower asymptote parameter α, 3) the 3-parameter model with upper asymptote parameter β, and 4) the 4-parameter model. Each can use either the logit or the probit link function, for a total of 8 models. Which model is appropriate for a given data set is determined by the ratio of marginal likelihoods, also known as the Bayes factor.
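The eight candidates can be enumerated and compared by their log evidence. The sketch below uses the simplest possible estimator, averaging the likelihood over draws from the prior. This technique is substituted for brevity and is not the MCMC-based approach referred to in the text; it becomes very noisy as the number of parameters or data points grows. It also assumes proper uniform priors (b0 in [-10, 10], b1 in [0, 10], α in [0, 0.5], β in [0.5, 1]). These bounds are illustrative assumptions, and Bayes factors are sensitive to prior ranges, which is why improper flat priors cannot be used for model comparison.

```python
import math
import random
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF for the probit link

# All 8 candidates: (link, has lower asymptote alpha, has upper asymptote beta).
MODELS = [(link, lower, upper)
          for link in ("logit", "probit")
          for lower in (False, True)
          for upper in (False, True)]


def _cdf(z, link):
    if link == "probit":
        return _PHI(z)
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def log_likelihood(b0, b1, alpha, beta, sizes, hits, link, eps=1e-12):
    """Bernoulli log-likelihood of the general model alpha + (beta - alpha) * F(b0 + b1 ln a)."""
    total = 0.0
    for a, y in zip(sizes, hits):
        p = alpha + (beta - alpha) * _cdf(b0 + b1 * math.log(a), link)
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def log_evidence(sizes, hits, link, lower, upper, n_draws=20000, seed=1):
    """Prior-sampling Monte Carlo estimate of ln p(y | model) under assumed uniform priors."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_draws):
        b0 = rng.uniform(-10.0, 10.0)
        b1 = rng.uniform(0.0, 10.0)
        alpha = rng.uniform(0.0, 0.5) if lower else 0.0  # alpha fixed at 0 if absent
        beta = rng.uniform(0.5, 1.0) if upper else 1.0   # beta fixed at 1 if absent
        values.append(log_likelihood(b0, b1, alpha, beta, sizes, hits, link))
    m = max(values)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in values) / n_draws)
```

Differences of `log_evidence` between two entries of `MODELS` are log Bayes factors. With such a crude estimator, results should be checked by repeating the calculation with different seeds and a larger `n_draws`.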

AVOIDING PITFALLS IN HIT/MISS ANALYSIS

The combination of 3- and 4-parameter models with Bayesian methods is a powerful tool for avoiding pitfalls in the analysis of hit/miss data. The process is simply to calculate the marginal likelihood for each of the 8 possible models and determine which model is most appropriate. If a 3- or 4-parameter model is selected, it is quite possible that a90 or a90/95 does not exist, even though inference with the 2-parameter model provides an estimate. Consider an example that appeared in (Generazio, 2011): a data set the author identified as “Case 2 data set D8001(3)L.” The reported “Logit-ML” a90/95 value was 12.95 mm. The latest version of MIL-HDBK-1823 was not cited in that paper, so the analysis may have been done with older software having a known deficiency, which has since been corrected by implementing the likelihood ratio method for confidence bound calculation (Annis et al., 2007). The a90/95 value obtained with the likelihood ratio method is 22 mm, and many warnings are displayed during the calculation (Annis, 2012). Because this data set contains many misses at large flaw sizes, it most likely does not meet the requirements for analysis under the procedures in (MIL-HDBK-1823A, 2009). For this case, the data are analyzed using 8 potential models: the 2-parameter, 3-parameter with upper bound, 3-parameter with lower bound, and 4-parameter models, each with both the logit and the probit link. The marginal likelihoods for each are listed in Table 2.

Model type                 logit        probit       Bayes factor (logit/probit)
2-parameter                1.1806e-87   2.7022e-88   4.3690
3-parameter lower bound    2.9183e-89   4.2119e-89   0.6929
3-parameter upper bound    6.1396e-83   8.28e-83     0.7415
4-parameter                9.2848e-83   4.0047e-81   0.0232

Table 2. Marginal likelihoods and Bayes factors for each possible model
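The model-selection arithmetic for the table above can be reproduced in a few lines. The sketch below copies the reported marginal likelihoods, recomputes the logit/probit Bayes factors, and picks the model with the largest evidence. The dictionary layout and function names are illustrative only.

```python
# Marginal likelihoods from the table above, keyed by model type: (logit, probit).
EVIDENCE = {
    "2-parameter": (1.1806e-87, 2.7022e-88),
    "3-parameter lower bound": (2.9183e-89, 4.2119e-89),
    "3-parameter upper bound": (6.1396e-83, 8.28e-83),
    "4-parameter": (9.2848e-83, 4.0047e-81),
}
_LINK_INDEX = {"logit": 0, "probit": 1}


def logit_probit_bayes_factors(evidence):
    """Bayes factor logit/probit for each model type (last column of the table)."""
    return {model: logit / probit for model, (logit, probit) in evidence.items()}


def best_model(evidence):
    """Return (model type, link) with the largest marginal likelihood."""
    candidates = [
        (value, model, link)
        for model, pair in evidence.items()
        for link, value in zip(("logit", "probit"), pair)
    ]
    _, model, link = max(candidates)
    return model, link


def bayes_factor(evidence, first, second):
    """Bayes factor p(y | first) / p(y | second); each argument is (model type, link)."""
    return (evidence[first[0]][_LINK_INDEX[first[1]]]
            / evidence[second[0]][_LINK_INDEX[second[1]]])
```

For example, `bayes_factor(EVIDENCE, ("4-parameter", "probit"), ("2-parameter", "logit"))` is roughly 3.4 × 10^6. This quantifies how strongly the data favor the 4-parameter probit model over the 2-parameter logit form used for the reported Logit-ML estimate.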

The ratio of the evidence, or marginal likelihoods, of competing models is used to select the best model; this ratio is the Bayes factor. The marginal likelihood of an individual model does not provide useful information by itself, but relative marginal likelihoods do. Note that all models with upper asymptotes have marginal likelihoods at least four orders of magnitude higher than the models without one, so the evidence strongly suggests that an upper bound is necessary for this data set. Also note that the Bayes factor is greater than 1 only for the 2-parameter case. Since the ratio is logit/probit, this implies that the probit link fits the data better for every model type except the 2-parameter model. The 4-parameter probit model has the highest marginal likelihood of all the possibilities, so it is the best choice among the models. It turns out that a90 and a90/95 do not exist for this data set: the estimated lower asymptote is 0.2657 and the upper asymptote is 0.8574. This indicates that the false call rate is quite high and that the POD approaches, on average, 0.8574 as the flaw size goes to infinity; since this limit is below 0.9, no flaw size achieves 90% POD. Figure 4 displays the 4-parameter probit model, which is the most appropriate for this data set.
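The existence argument can be checked by inverting the general model: α + (β - α)·F(b0 + b1 ln a) = 0.9 has a solution only if α < 0.9 < β (for b1 > 0). The sketch below performs that inversion. The fitted b0 and b1 for data set D8001(3)L are not reported, so the values in the usage line are hypothetical; only α = 0.2657 and β = 0.8574 come from the text. The same reasoning covers a90/95: if POD itself never reaches 0.9, a lower confidence bound on POD cannot reach it either.

```python
import math
from statistics import NormalDist


def flaw_size_at_pod(target, b0, b1, alpha=0.0, beta=1.0, link="probit"):
    """Solve alpha + (beta - alpha) * F(b0 + b1 ln a) = target for a; None if no solution."""
    # With b1 > 0, POD rises from alpha to beta, so the target is reachable only if alpha < target < beta.
    if b1 <= 0 or not (alpha < target < beta):
        return None
    q = (target - alpha) / (beta - alpha)  # required value of F(z)
    if link == "probit":
        z = NormalDist().inv_cdf(q)
    elif link == "logit":
        z = math.log(q / (1.0 - q))
    else:
        raise ValueError(f"unknown link: {link!r}")
    return math.exp((z - b0) / b1)
```

With hypothetical coefficients, `flaw_size_at_pod(0.9, -2.0, 1.5, alpha=0.2657, beta=0.8574)` returns `None`. The result is `None` for any b0 and b1, because the target 0.9 exceeds the upper asymptote β.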


SUMMARY AND CONCLUSIONS

The proper analysis of hit/miss inspection data remains very important, since this type of data is still widely collected for POD studies. A brief historical survey of the analysis of such data in the context of POD was given, together with recent advances. Some concerns related to the statistical analysis and design of POD studies were listed and addressed. In particular, the potential to obtain estimates of a90 or a90/95 from conventional 2-parameter models when the data do not support their existence was examined. An approach using additional parameters in the logit or probit model to adequately describe the tail behavior of POD curves was presented, and Bayesian methods can be used to estimate the parameters of these more complicated models. Finally, the Bayes factor is used to determine which model is most suitable, which should mitigate concerns about improperly estimating a90 and a90/95.
•  Prior - Belief about the statistical parameters, or expert opinion.

•  Evidence - Normalizing constant, or marginal likelihood, that is important for model selection purposes.

•  Posterior - Integration of experimental data with beliefs, expert opinion, and physics-based models.

•  Likelihood - Probability that the data are generated by the statistical model with the given parameters.

Figure 1: Diagram of Bayesian approach used for statistical inference.
[Plot; vertical axis: probability of detection]

Figure 2. Data from (Berens, 1989) analyzed using Bayesian approach with noninformative priors.
[Plot; vertical axis: POD]

Figure 3. POD model options: (a) 3-parameter with lower bound, (b) 3-parameter with upper bound, and (c) 4-parameter with lower and upper bound.
[Plot; vertical axis: probability of detection]

Figure 4. Probit 4-parameter model that best fits data set D8001(3)L.